abstractive text summarization
InforME: Improving Informativeness of Abstractive Text Summarization With Informative Attention Guided by Named Entity Salience
Shen, Jianbin, Liang, Christy Jie, Xuan, Junyu
Abstractive text summarization is integral to the Big Data era, which demands advanced methods to turn voluminous and often long text data into concise but coherent and informative summaries for efficient human consumption. Despite significant progress, there is still room for improvement in various aspects; one such aspect is informativeness. Hence, this paper proposes a novel learning approach consisting of two methods: an optimal transport-based informative attention method to improve learning of focal information in reference summaries, and an accumulative joint entropy reduction method on named entities to enhance informative salience. Experimental results show that our approach achieves better ROUGE scores than prior work on CNN/Daily Mail while remaining competitive on XSum. Human evaluation of informativeness also demonstrates the better performance of our approach over a strong baseline. Further analysis gives insight into the plausible reasons underlying the evaluation results.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > Middle East > Bahrain (0.04)
- (14 more...)
- Leisure & Entertainment > Sports (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
- Law (0.46)
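The optimal-transport idea behind the "informative attention" method above can be made concrete with a toy sketch: compute an entropic-regularized transport plan (via Sinkhorn iterations) that moves a model's attention mass toward a reference salience distribution. Everything here — the marginals, the position-based cost, and the loss — is an illustrative assumption, not the paper's actual implementation.

```python
import math

def sinkhorn(cost, a, b, eps=0.5, iters=500):
    """Entropic-regularized optimal transport plan between marginals a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternating scaling: u_i = a_i / (Kv)_i, then v_j = b_j / (K^T u)_j
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy alignment: pull attention mass toward an entity-salience distribution.
attention = [0.7, 0.2, 0.1]   # model attention over 3 tokens (assumed)
salience = [0.2, 0.3, 0.5]    # reference salience, e.g. from named entities
cost = [[abs(i - j) for j in range(3)] for i in range(3)]  # position cost
plan = sinkhorn(cost, attention, salience)
ot_loss = sum(plan[i][j] * cost[i][j] for i in range(3) for j in range(3))
```

Minimizing `ot_loss` as a training signal would penalize attention that is far (under the cost) from the salient tokens; the paper's formulation may differ in cost design and regularization.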
A Split-then-Join Approach to Abstractive Summarization for Very Long Documents in a Low Resource Setting
The $\texttt{BIGBIRD-PEGASUS}$ model achieves $\textit{state-of-the-art}$ results on abstractive summarization of long documents. However, its capacity is still limited to a maximum of $4,096$ tokens, which causes performance degradation when summarizing very long documents. A common way to deal with this limitation is to truncate the documents. In this research, we take a different approach: we fine-tune the pretrained $\texttt{BIGBIRD-PEGASUS}$ model on a dataset from another domain. First, we filter out all documents shorter than $20,000$ tokens to focus on very long documents. To prevent domain shift and overfitting during transfer learning on the small dataset, we augment the dataset by splitting each document-summary training pair into parts, so that each document fits into $4,096$ tokens. Source code is available at $\href{https://github.com/lhfazry/SPIN-summ}{https://github.com/lhfazry/SPIN-summ}$.
- Asia > Indonesia (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (5 more...)
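The split step described above can be sketched as greedy sentence packing into chunks that fit the encoder window. The whitespace "tokenizer" below is an assumption for illustration; the actual pipeline would count subword tokens with the BIGBIRD-PEGASUS tokenizer.

```python
def split_document(text, max_tokens=4096):
    """Greedily pack sentences into chunks of at most max_tokens tokens."""
    chunks, current, current_len = [], [], 0
    for sentence in text.split(". "):
        n = len(sentence.split())  # whitespace token count (an assumption)
        if current and current_len + n > max_tokens:
            chunks.append(". ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(". ".join(current))
    return chunks

doc = ". ".join(["one two three"] * 5)  # five 3-token "sentences"
parts = split_document(doc, max_tokens=10)
```

A real implementation would also have to split the paired reference summary consistently with the document parts, which this sketch does not attempt.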
Advancements in Natural Language Processing for Automatic Text Summarization
Jayatilleke, Nevidu, Weerasinghe, Ruvan, Senanayake, Nipuna
The substantial growth of textual content across diverse domains and platforms has created a considerable need for Automatic Text Summarization (ATS) techniques that aid the process of text analysis. Advances in Natural Language Processing (NLP) and Deep Learning (DL) have significantly enhanced the effectiveness of text summarization models in a variety of technical domains. Despite this, summarizing textual information remains significantly constrained by the intricate writing styles of different texts, which involve a range of technical complexities. Text summarization techniques can be broadly categorized into two main types: extractive summarization and abstractive summarization. Extractive summarization directly extracts sentences, phrases, or segments of text from the content without making any changes. Abstractive summarization, on the other hand, reconstructs the sentences, phrases, or segments of the original text using linguistic analysis. Through this study, a linguistically diverse categorization of text summarization approaches is addressed in a constructive manner. The authors explore existing hybrid techniques that employ both extractive and abstractive methodologies, and also investigate the pros and cons of the various approaches discussed in the literature. Furthermore, the authors conduct a comparative analysis of different techniques and metrics for evaluating summaries generated by language generation models. This survey endeavors to provide a comprehensive overview of ATS by presenting the progression of language processing on this task, through a breakdown of diverse systems and architectures accompanied by technical and mathematical explanations of their operations.
- Asia > Singapore (0.04)
- Asia > Sri Lanka (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- (5 more...)
- Overview (1.00)
- Research Report > Experimental Study (0.46)
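To make the extractive/abstractive distinction in the abstract above concrete, here is a minimal frequency-based *extractive* summarizer: it copies the highest-scoring source sentence verbatim, which is exactly what abstractive methods do not do. This is a textbook illustration, not a method from the surveyed papers.

```python
from collections import Counter

def extractive_summary(text, k=1):
    """Return the k source sentences with the highest average word frequency."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    scored = sorted(
        sentences,
        key=lambda s: sum(freqs[w.lower()] for w in s.split()) / len(s.split()),
        reverse=True,
    )
    return ". ".join(scored[:k]) + "."

text = "Summarization shortens text. Summarization keeps key text. Birds fly south."
summary = extractive_summary(text, k=1)
```

An abstractive system would instead generate new wording (e.g. "Summaries condense documents"), which is why it needs a generative model rather than a sentence scorer.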
Abstractive Text Summarization for Bangla Language Using NLP and Machine Learning Approaches
Miazee, Asif Ahammad, Roy, Tonmoy, Islam, Md Robiul, Safat, Yeamin
Text summarization involves reducing extensive documents to short sentences that encapsulate the essential ideas. The goal is to create a summary that effectively conveys the main points of the original text. We spend a significant amount of time each day reading the newspaper to stay informed about current events both domestically and internationally. While reading newspapers enriches our knowledge, we sometimes come across unnecessary content that isn't particularly relevant to our lives. In this paper, we introduce a neural network model designed to summarize Bangla text into concise and straightforward paragraphs, aiming for greater stability and efficiency.
- North America > United States > Utah (0.05)
- North America > United States > Virginia (0.04)
- North America > United States > Iowa (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Overview (0.69)
- Research Report (0.64)
Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi
Ransing, Rasika, Dhamaskar, Mohammed Amaan, Rajpurohit, Ayush, Dhoke, Amey, Dalvi, Sanket
India's vast linguistic diversity presents unique challenges and opportunities for technological advancement, especially in the realm of Natural Language Processing (NLP). While there has been significant progress in NLP applications for widely spoken languages, the regional languages of India, such as Marathi and Hindi, remain underserved. Research in the field of NLP for Indian regional languages is at a formative stage and holds immense significance. The paper aims to build a platform which enables the user to use various features like text anonymization, abstractive text summarization and spell checking in English, Hindi and Marathi language. The aim of these tools is to serve enterprise and consumer clients who predominantly use Indian Regional Languages.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > Canada > Ontario > Toronto (0.05)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (4 more...)
- Research Report (0.64)
- Overview (0.46)
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics
Nnadi, Gospel Ozioma, Bertini, Flavio
Readers and scholars often desire a concise summary (Too Long; Didn't Read - TL;DR) of texts to effectively prioritize information. However, creating document summaries is mentally taxing and time-consuming, especially considering the overwhelming volume of documents produced annually, as depicted in Figure 1 [2] and Figure 2 [3]. For instance, [3] reported over 100,000 scientific articles on the coronavirus pandemic in 2020; although these articles contain brief abstracts, the sheer volume poses challenges for researchers and medical professionals in quickly extracting relevant knowledge on a specific topic. Automatically generated multi-document summaries could be valuable, providing readers with essential information and reducing the need to access original files unless refinement is necessary. Text summarization has garnered significant research attention, proving useful in search engines, news clustering, timeline generation, and various other applications. The objective of text summarization is to create a brief, coherent, factually consistent, and readable document that retains the essential information from the source, whether it is a single document or multiple documents. Single Document Summarization (SDS) uses only one input document, eliminating the need for additional processing to assess relationships between inputs; this method is suitable for summarizing standalone documents such as emails, legal contracts, and financial reports. The primary goal of Multi-Document Summarization (MDS) is to gather information from several texts addressing the same topic, often composed at different times or representing diverse perspectives. The overarching objective is to produce information reports that are both succinct and comprehensive, consolidating varied opinions from documents that explore a topic through multiple viewpoints.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (14 more...)
- Research Report (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- (2 more...)
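A first practical step in the MDS setting described above is pooling sentences from several same-topic documents while dropping near-duplicates, since different sources often repeat the same facts. The word-overlap threshold below is an illustrative assumption, not a value from the survey.

```python
def pool_unique_sentences(docs, overlap=0.8):
    """Pool sentences from several documents, skipping near-duplicates."""
    kept = []
    for doc in docs:
        for s in (x.strip() for x in doc.split(".") if x.strip()):
            words = set(s.lower().split())
            duplicate = any(
                len(words & set(k.lower().split())) / max(len(words), 1) >= overlap
                for k in kept
            )
            if not duplicate:
                kept.append(s)
    return kept

docs = [
    "The storm hit the coast. Power was lost",
    "The storm hit the coast. Repairs began",
]
pooled = pool_unique_sentences(docs)
```

Real MDS systems use stronger redundancy measures (e.g. embedding similarity), but the same pool-then-deduplicate structure applies.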
L3Cube-MahaSum: A Comprehensive Dataset and BART Models for Abstractive Text Summarization in Marathi
Deshmukh, Pranita, Kulkarni, Nikita, Kulkarni, Sanhita, Manghani, Kareena, Joshi, Raviraj
We present the MahaSUM dataset, a large-scale collection of diverse news articles in Marathi, designed to facilitate the training and evaluation of models for abstractive summarization tasks in Indic languages. The dataset, containing 25k samples, was created by scraping articles from a wide range of online news sources and manually verifying the abstract summaries. Additionally, we train an IndicBART model, a variant of the BART model tailored for Indic languages, using the MahaSUM dataset. We evaluate the performance of our trained models on the task of abstractive summarization and demonstrate their effectiveness in producing high-quality summaries in Marathi. Our work contributes to the advancement of natural language processing research in Indic languages and provides a valuable resource for future research in this area using state-of-the-art models.
- Asia > India > Maharashtra > Pune (0.05)
- Asia > India > Tamil Nadu > Chennai (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
Abstractive Text Summarization: State of the Art, Challenges, and Improvements
Shakil, Hassan, Farooq, Ahmad, Kalita, Jugal
Specifically focusing on the landscape of abstractive text summarization, as opposed to extractive techniques, this survey presents a comprehensive overview, delving into state-of-the-art techniques, prevailing challenges, and prospective research directions. We categorize the techniques into traditional sequence-to-sequence models, pre-trained large language models, reinforcement learning, hierarchical methods, and multi-modal summarization. Unlike prior works that did not examine complexities, scalability and comparisons of techniques in detail, this review takes a comprehensive approach encompassing state-of-the-art methods, challenges, solutions, comparisons, limitations and charts out future improvements - providing researchers an extensive overview to advance abstractive summarization research. We provide vital comparison tables across techniques categorized - offering insights into model complexity, scalability and appropriate applications. The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics, among others. Solutions leveraging knowledge incorporation and other innovative strategies are proposed to address these challenges. The paper concludes by highlighting emerging research areas like factual inconsistency, domain-specific, cross-lingual, multilingual, and long-document summarization, as well as handling noisy data. Our objective is to provide researchers and practitioners with a structured overview of the domain, enabling them to better understand the current landscape and identify potential areas for further research and improvement.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > India > NCT > New Delhi (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (13 more...)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)
HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew
Paz-Argaman, Tzuf, Mondshine, Itai, Mordechai, Asaf Achi, Tsarfaty, Reut
While large language models (LLMs) excel in various natural language tasks in English, their performance in lower-resourced languages like Hebrew, especially for generative tasks such as abstractive summarization, remains unclear. The high morphological richness in Hebrew adds further challenges due to the ambiguity in sentence comprehension and the complexities in meaning construction. In this paper, we address this resource and evaluation gap by introducing HeSum, a novel benchmark specifically designed for abstractive text summarization in Modern Hebrew. HeSum consists of 10,000 article-summary pairs sourced from Hebrew news websites written by professionals. Linguistic analysis confirms HeSum's high abstractness and unique morphological challenges. We show that HeSum presents distinct difficulties for contemporary state-of-the-art LLMs, establishing it as a valuable testbed for generative language technology in Hebrew, and MRLs generative challenges in general.
- Asia > Singapore (0.04)
- Asia > Middle East > Israel (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (5 more...)
Improving Sequence-to-Sequence Models for Abstractive Text Summarization Using Meta Heuristic Approaches
Saxena, Aditya, Ranjan, Ashutosh
As human society transitions into the information age, our attention spans are shrinking: the number of people who spend time reading lengthy news articles is decreasing rapidly, and the need for succinct information is higher than ever before. Therefore, it is essential to provide a quick overview of important news by concisely summarizing the top news article under the most intuitive headline. When humans write summaries, they extract the essential information from the source and add useful phrases and grammatical annotations to the original extract. Humans have a unique ability to create abstractions; automatic summarization, however, is a complicated problem to solve. The use of sequence-to-sequence (seq2seq) models for neural abstractive text summarization has been rising in prevalence. Numerous innovative strategies have been proposed to develop current seq2seq models further, permitting them to handle issues like saliency, fluency, and readability and to create excellent summaries. In this article, we aim to enhance present architectures and models for abstractive text summarization. The modifications focus on fine-tuning hyper-parameters and attempting specific encoder-decoder combinations. We ran extensive experiments on the widely used CNN/DailyMail dataset to check the effectiveness of the various models.
- Asia > Myanmar (0.04)
- Asia > India > NCT > Delhi (0.04)
- North America > United States > New York (0.04)
- (3 more...)
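Several abstracts above report ROUGE scores on CNN/DailyMail; as a reference point, here is a minimal ROUGE-1 (unigram overlap) F1 computation. Real evaluations use the official ROUGE toolkit with stemming and bootstrap resampling, both of which this sketch omits.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """F1 of unigram overlap between a candidate summary and a reference."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat", "the cat sat on the mat")
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern over bigrams and longest common subsequences, respectively.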